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In epidemiology, the person-years method is broadly used to estimate the 
incidence rates of health related events. This needs determination of time 
at risk stratified by period, age and sometimes by duration of disease or 
exposition. The article describes a fast method for calculating the time at 
risk in two- or three-dimensional Lexis diagrams based on Siddon's algorithm. 
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1 Lexis diagram and person-years method 

In epidemiology, oftentimes relevant events or outcomes simultaneously depend on dif- 
ferent time scales: age of the subjects, calendar time and duration of an irreversible 
disease. In event history analysis, [5], a useful concept is the Lexis diagram, which is 
a co-ordinate system with axes calendar time t (abscissa) and age a (ordinate). The 
calendar time dimension sometimes is referred to as period. Each subject is represented 
by a line segment from time and age at entry to time and age at exit. Entry and exit may 
^ \ be birth and death, respectively, or entry and exit in a epidemiological study or trial. 

There are excellent and extensive introductions about the theory of Lexis diagrams (see 
for example [3], [I], [1] and references therein), which allows to be short here. When it 
comes to irreversible diseases, the commonly used two-dimensional Lexis diagram with 
axes in time and age direction may be generalized to a three-dimensional co-ordinate 
system with disease duration d represented by the applicate (z-axis). If a subject does 
not get the disease during life time, the life line remains in the time-age-plane parallel 
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to the line bisecting abscissa and ordinate. With other words, the life line for the time 
without disease points in the (1,1,0) direction (where the triple (t,a,d) denotes the 
co-ordinates in time, age and duration direction, respectively). However if at a certain 
point in time E the disease is diagnosed, the life line changes its direction, henceforth 
pointing to (1, 1, 1). The situation is illustrated in Figure[TJ The life lines of two subjects 
are shown in the three-dimensional Lexis space. At time of birth (denoted B n , n = 1,2) 
both subjects are disease- free; both life lines go to the (1, 1,0) direction. The first sub- 
ject gets the disease at E, and henceforth the life line is parallel to (1,1,1) until death 
at D\. The second subject remains without the disease for the whole life, which ends at 

d + 




Figure 1: Three-dimensional Lexis diagram with two life lines. Abscissa, ordinate and 
applicate represent calendar time t, age a and duration d, respectively. The life 
lines start and end at birth B n and death D n , n = 1,2. The first subject gets 
the disease at E. Then, the life line changes its direction. The second subject 
does not get the disease, the corresponding life line remains in the t-a-plane. 



In order to measure the frequency of events in a population, such as onset of a chronic 
disease, the person-years methods records the number of people who are affected and 
the time elapsed before the event occurs. The person-years incidence rate A is estimated 
by 

A = -, (1) 

m 

where e is is the number of events and m is the number of person-years at risk jTJ p. 
250ff]. Calendar time and age often are important determinants for occurrence of events 
and have to be taken into account. This usually is achieved by dividing the subjects' 
time spent in the study into calendar time and age groups. Let eij be the number of 
events taking place while subjects are in time and age group Furthermore, let rriij 
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be the total time at risk spent in this group, then Equation (pQ) becomes 



In the planar Lexis diagram it is clear, how the time at risk rriij can be obtained. Let 
the time and age group (i, j) be defined by Cartesian product Sij := [U-i, U) x [ a j-ii a j)- 
Each subject whose life line intersects with the rectangle Sij contributes by its time at 
risk in Sij. To be precise, rriij is the sum of all the subjects' times at risk spent in Sij - . 

N 

71=1 

(n) 

where £)■ is the time at risk of subject n, n = 1, . . . , N, in the rectangle Sa- 

Again, these ideas can be generalized to three-dimensional case: Then, is the 
number of events taking place in the rectangular hexahedron 

Sijk '■= [U-i,ti) x [o-j-i,aj) x [dk-i,dk)- 
For the times at risk mijk it holds 

N 

m ijk = J2 £ ijl ( 3 ) 

n=l 

(n) 

where is the time at risk of subject n in volume element Sf/fc. 

Given a certain study population of size N, the question arises how the subjects' 

(n) 

contributions l\- k to the overall time at risk spent in S^k can be calculated. Since 
N may be large (up to several thousand), some attention should be paid to computation 
time. 

The solution is straightforward by noting that the question is very similar to the 
problem of following a radiological path through a voxel grid in tomography or raytracing 
in computer graphics. For both fields, tomography and raytracing, there is an ongoing 
research effort to efficiently discretize continuous lines (radiological paths or rays of 
light). This article has been inspired by the seminal work of Siddon, [B]. 

2 Intersecting life lines with voxels in the Lexis diagram 

Since the algorithm presented in this section is motivated from the field of computer 
tomography, some of the terminology is useful. Typically one of the sets S^k resulting 
from a partition of a rectangular hexahedron (right cuboid) into congruent volume ele- 
ments, is called a voxel. The six faces of each voxel are subsets of two adjacent planes 
parallel either to the i-a-plane, a-d-plane or t-d-plane. Hence, the voxel space comes 
along with a set of equidistant, parallel planes which are perpendicular to the abscissa, 
ordinate or applicate and which are defined by the union of all voxel faces. These planes 
play a crucial role in the algorithm. 
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In this article all voxels Syk are considered to be cubical, with all edges having the 
length t r , t r > : 



Sijk ■■= [tr •(»-!), tr ■ i) X [t r ■ (j - 1) , i r • j) X [t r • (fc — 1) , t r ■ fc) . (4) 

These voxels form a grid where the life lines of all subjects in the study are sorted into. 
As a consequence of cubical voxels, the temporal resolution with respect to calendar time, 
age and duration is the same. However, generalization to partitions usings rectangular 
voxels with height, length and depth being different is easily possible. 

The main idea for calculating the in the life line C n of subject n starting at 

entry point B n := (t^\a^\d^), ending at exit point D n := (i^^a^ ,d\ ), is the 
parameterization in the form 

C n : B n + a-(D n -B n ), a G[0,1]. 

Note, that — ijj = — o!q = d± — d^ =: At^ n \ Using this parameterization, 
all parameters E [0, 1] are calculated where an intersection with a voxel face takes 
place. Since the voxels are arranged in a regular grid, intersecting one of the voxel faces 
is equivalent with intersecting one of the t-a-, a-d- or t-d-pl&nes formed by the union of 
all voxel faces mentioned above. Hence, we calculate the intersections with these planes. 

Let us start with the a-d-planes (perpendicular to the i-axis): all those where an 
intersection with an a-ci-plane occurs are given by 

Jn) (u) = U-t r -(t ( o ) %t r ) (n) 

where % is the modulo-operator and £A n ) denotes the number of intersected a-<i-planes: 



(«) , 
1 



/tr i^/tr 



Similar formulas hold for those {v), v = 1, . . . , V^ n \ and a^\w), w = 1, . . . , W^ n \ 
where C n intersects the t-d- or i-a-planes, respectively. Now define the set 

A n := {al n \u) | u = l,..., U™} 

U {aW(«) \v = l,..., V™} (5) 
U {af\w) | w = l,...,W (n) }, 

which contains those € [0, 1] where an intersection occurs. Note that the three sets 
on the right-hand side of Equation ([5]) are not necessarily disjoint. Multiple values occur 
if an intersection happens to be on an edge or vertex of a voxel. Let ^4* := A n U {0, 1} 
be ordered ascendingly A* n = {a( n \p) \ p = l,...,P( n )} with = a( n )(l) < ••• < 
Q.W(pW) = 1. For calculating the and the associated voxel indices i,j,k, we have 
following algorithm: 

1. For each subject n, n = 1,...,N, calculate the set A* as above and sort the 
elements a^ n \p), p = 1, . . . , p( n \ in ascending order. 
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2. For p = 1,.. . ,P( n ) set 




£? w + g( n) (p)-(£n-£n) 



(6) 



where the division is taken componentwise. 



3. Then, calculate 



,(n) 




(7) 



For each voxel A;) summing up all the times (.™ k , n = 1, ■ ■ ■ ,N, by Equation (|3"j) 
yields the time at risk in period-, age- and duration class (i,j,k), which can be 
used in the person-years method. 

The idea of calculating the intersection points with the voxel faces goes back to Robert 
L. Siddon. The algorithm proposed by Siddon has been developed for raytracing in 
tomography, where several millions of paths have to be computed to form a radiological 
image. While the implementation provided with this article is not tuned for efficiency, 
remarkable speeding up is possible [2]. The execution of 500 runs of an R (The R 
Foundation for Statistical Computing) implementation of the algorithm with the data 
of 200 patients (equivalent to a total of 10 5 patients) on a 2.6 GHz personal computer 
takes 92 seconds. The (simulated) patient data set is described in more detail in the 
next section. 

3 Examples 

An implemtentation of the method in this article has been tested with simulated patient 
data. A study population of 200 subjects with entry age 55 to 85 born between and 15 
(TUs) suffering from a chronic disease for 3 to 15 TUs at is the time of entry is assumed 
to have the mortality rate 



Exit from the study is assumed to be only due to death (no censoring). The aim is to 
unfold the mortality rate from these patient data. 



m{a,d) = exp(-10 + 0.1 • a) ■ (1 + 0.1 • d). 
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For setting up the data set, following code has been used: 



for(patNr in 1:200){ 
thisPatlnAge 
thisPatBirth 
thisPatDuration 



<- runif(l, 55, 80) 
<- runif(l, 0, 15) 
<- runif(l, 3, 15) 



thisPatDeathAge 
patMatrix[patNr , ] 



<- fct_F (thisPatlnAge , thisPatDuration) 

<- round( which. min(runif (1) > F) + thisPatlnAge 

+ (runif (1) - 0.5) , 3) 
<- c (thisPatBirth, thisPatlnAge, thisPatDuration, 
thisPatDeathAge) 



For the simulation of the age of death, inverse transform sampling is used: 
(which. min(runif (1) > F) ). Therefor the cumulative distribution function F(t \ ao,do) 
for someone who enters the study at age ao and having got the disease for do TUs is 
calculated in the function f ct_F: 

F(t | a , d ) = 1 - exp f— J m(a + T,d + r)dr^ . 

If the alorithm is applied to this data set with t r = 5, it is possible to estimate the 
mortality rate m(a,d) by the person- years method. The result is shown in Figure [2J 
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Figure 2: Age- and duration specific mortality as estimated by the person-years method. 

4 Conclusion 

This article is about an extension of the person-years when period-, age- and duration- 
effects occur. The method to calculate the time at risk is based on raytracing techniques 
used in tomography and provides a fast way to follow the individual life lines of subjects 
in the Lexis diagram. The algorithm is able to treat two cases 

1. calculate the time at risk for newly incident cases, and 

2. calculate the time at risk, where the time elapsed after an event (e.g. duration 
since onset of a disease) is relevant. 

In the first case, the life lines are located in the t-a-plane, in the second the life lines 
are pointing to the (1,1,1) direction. The later case is important, when the duration 
is a covariable. This might be the case in late sequalae or mortality after having got a 
disease. Similarly, by interpreting the applicate axis (z-axis) as duration of exposure to 
a risk factor, the time at risk depending on period, age and duration of exposure can be 
calculated. 

In large populations or register data (log 10 N > 5 , times at risk are usually estimated 
by a formula going back to Sverdrup, [[1, Sect. 3.2.]. It is noted that using the method 
described in this article, estimation is no longer necessary, because given entry and exit 
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times of the subjects, calculation of times at risk is possible at feasible computational 
expense. 
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